13. Minimizing Error Functions
INSTRUCTOR NOTE: From 2:22 onward, the slide title should say "Mean Absolute Error".
Development of the derivative of the error function
Notice that we've defined the squared error to be
Error = \frac{1}{2} (y - \hat{y})^2.
Also, we've defined the prediction to be
\hat{y} = w_1 x + w_2.
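As a minimal sketch, the two definitions above can be written as plain Python functions. The function names (`predict`, `squared_error`) and the sample values are assumptions chosen for illustration and do not come from the lesson.

```python
def predict(w1, w2, x):
    """Linear prediction: y_hat = w1 * x + w2."""
    return w1 * x + w2


def squared_error(y, y_hat):
    """Squared error for a single point: 1/2 * (y - y_hat)^2."""
    return 0.5 * (y - y_hat) ** 2


# Hypothetical weights and a single hypothetical data point.
w1, w2 = 2.0, 1.0
x, y = 3.0, 10.0

y_hat = predict(w1, w2, x)        # 2 * 3 + 1 = 7
error = squared_error(y, y_hat)   # 0.5 * (10 - 7)^2 = 4.5
print(y_hat, error)
```

The factor of 1/2 in the error is what cancels the 2 that appears when the square is differentiated.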
So to calculate the derivative of the Error with respect to w_1, we simply use the chain rule:
\frac{\partial}{\partial w_1} Error = \frac{\partial Error}{\partial \hat{y}} \frac{\partial \hat{y}}{\partial w_1}.
The first factor on the right-hand side is the derivative of the Error with respect to the prediction \hat{y}, which is
-(y-\hat{y}).
The second factor is the derivative of the prediction with respect to w_1, which is simply
x.
Therefore, the derivative is the product of the two factors:
\frac{\partial}{\partial w_1} Error = -(y-\hat{y}) x.
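One way to sanity-check the chain-rule result, -(y-\hat{y}) x, is to compare it against a central finite-difference approximation of the error. The sketch below assumes a single hypothetical data point and hypothetical weights; the function names and the step size `h` are illustrative choices, not part of the lesson.

```python
def predict(w1, w2, x):
    """Linear prediction: y_hat = w1 * x + w2."""
    return w1 * x + w2


def squared_error(w1, w2, x, y):
    """Squared error for one point: 1/2 * (y - y_hat)^2."""
    return 0.5 * (y - predict(w1, w2, x)) ** 2


def grad_w1(w1, w2, x, y):
    """Analytic derivative from the chain rule: -(y - y_hat) * x."""
    return -(y - predict(w1, w2, x)) * x


def numeric_grad_w1(w1, w2, x, y, h=1e-6):
    """Central finite-difference approximation of dError/dw1."""
    plus = squared_error(w1 + h, w2, x, y)
    minus = squared_error(w1 - h, w2, x, y)
    return (plus - minus) / (2 * h)


# Hypothetical weights and data point.
w1, w2, x, y = 2.0, 1.0, 3.0, 10.0

analytic = grad_w1(w1, w2, x, y)         # -(10 - 7) * 3 = -9
numeric = numeric_grad_w1(w1, w2, x, y)  # should agree up to rounding
print(analytic, numeric)
```

The two values should agree to several decimal places. The same check, perturbing w_2 instead of w_1, can be used to confirm the answer to the exercise.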
Exercise
Calculate the derivative of the Error with respect to w_2 and verify that it is precisely
-(y-\hat{y}).